The need to assess attendance behavior often arises, at the
line-management level, when an employee is considered for a transfer or
a promotion. A sound assessment should, of course, take into account
the statistical behavior and distributional properties of absenteeism.
The first part of this paper is a detailed statistical analysis of
attendance records of a sample of 112 telephone operators. We use
exploratory and confirmatory statistical techniques to suggest possible
theoretical models that can parsimoniously describe the behavior
of the variables of interest. Methodological difficulties that often
arise in cross-sectional studies and are caused by biased sampling
are pointed out and treated. We explore the relation between age and
attendance; in particular, it is evident that (for this data set) the
frequency of "incidental" absences tends to decrease with age, and
that the duration of "disability" absences tends to increase with age.
In the second part of the paper we suggest an attendance evaluation
method based on the statistical analysis of the first part. The method
is designed to reflect the current-year attendance as well as a
longer-run attendance behavior, interpreted as a personal characteristic,
and its properties are demonstrated via examples.

I. INTRODUCTION AND SUMMARY 

Management policy regarding absenteeism has two major aspects:
a global one, spelled out in the various company rules and applied
evenly to all employees, and a local one, generally less formal, in which
line management is concerned with an individual's attendance. A
question like how many "paid days off" per year an employee should be
allowed for unexpected and unavoidable absences is often a subject for
union negotiations and is a good example of what we mean by
management's global policy. On the other hand, the need to decide whether
a given operator has exhibited satisfactory attendance arises when
that operator is considered for a transfer or promotion, and is a good
example of management's local policy. Whether local or global, a sound
policy should consider the statistical characteristics and the
distributional properties of absenteeism.

Section II gives a detailed statistical analysis of absenteeism (on the
basis of a sample of 112 telephone operators). Such an analysis can
enhance our understanding of absenteeism and can be used as a basis
for answering questions of the type described above. For example, the
distribution of the duration of incidental absences (Section 2.2) and
the frequency of incidental absences per year (Fig. 6, or more generally
Section 2.6) can be used to decide how many paid days off per year
an employee should be allowed. An answer based on such a statistical
analysis is more likely to satisfy the true needs of the average employee
than any decision that makes no reference to the distributional
properties of absenteeism. (Note that the Bell System's allowance for
personal time started after our data were taken.)

In Section III we suggest a method for assessing absenteeism, based
on our statistical findings of Section II, and discuss its properties. The
analysis of Section 2.4 indicates that one year is too short a period to
decide whether an operator is intrinsically "good," "bad," etc., regarding
attendance. Thus, if management is interested in assessing attendance
as a personal characteristic, the follow-up period needs to be
longer than one year. The conflict between the viewpoint that past
years' attendance should not affect the present evaluation (for any
type of performance rating) and the statistical observation that one
year is too short a period to assess attendance is resolved by basing
our evaluation method (Section III) on two indices. One index rates
the current-year attendance, while the other rates attendance
behavior as a personal characteristic and depends on the attendance
during the three most recent years.

Various aspects of absenteeism have been studied in recent years 
(particularly in the fields of labor relations, applied and industrial 
psychology, and management science). The major contributions of our 
paper to this area of research, and the relation to other studies, as we 
see them, are summarized below: 

(i) We suggest an intuitively appealing method for assessing
absenteeism that reflects the current-year attendance as well as
attendance behavior as a personal characteristic. With suitable
modifications, the method is adaptable to other occupations.

(ii) Often in cross-sectional studies a certain sampling bias is
introduced because the sampling is done along the time axis. The detailed
analysis of Section 2.4 shows how to identify this bias (and in some
instances how to estimate the underlying model in the presence of this
bias). This technique can be of use to other researchers analyzing
cross-sectional data. (A more detailed paper devoted entirely to
statistical questions that arise in the analysis of this type of data is
forthcoming.)

(iii) In the course of our analysis in Section II, we use some
graphical techniques that are common tools in exploratory data
analysis but are not yet familiar to most social scientists. These tools are
useful in the tedious chore of identifying patterns and models in large
data sets, and we hope that exposing researchers in the social
sciences to them will help make them popular.

(iv) Throughout the paper we distinguish between two types of
absences, disability and incidental (definitions in Section II). This
classification enables us to shed some light on the relation between
absenteeism and age. Several authors have tried to relate absenteeism
to age, and conflicting findings are often reported. Indeed, in a recent
study based on a survey of blue-collar production workers, Nicholson
et al. (Ref. 1, pp. 319-320) report a marked inverse relation (especially
for male employees) between absence frequency and age which,
as they point out, contrasts with the conclusions of Porter and Steers
(Ref. 2, a review of the literature on absenteeism and turnover)
and of Cooper and Payne (Ref. 3) that absence frequency increases with age.
Our data suggest that for telephone operators (all of whom in our
sample are female) the truth lies somewhere in the middle. That is,
the frequency of incidental absences is higher for younger operators,
while the frequency and duration of disability absences are higher for
older operators.

For readers who are interested in aspects of absenteeism that are 
not directly related to this work (such as economic, psychological, 
etc.), we include a supplementary reference list (which is by no means 
complete). 

II. DATA ANALYSIS AND STATISTICAL MODELING 
2.1 Introduction 

We distinguish between two types of absences: incidental absences 
(ia), which are usually short, more frequent, and (to a certain extent) 
controllable, and disability absences (da), which are usually long, less 
frequent, and uncontrollable. Formally, a da is any absence that lasts 
six or more days and is due to an illness (an exception is an on-the-job 
accident in which case the da period can be shorter than six days); 
any other absence is defined as an ia. Periods of attendance at work 
will be referred to as showing up (su) periods. 

Our data are made up of the attendance records of 112 New England
Telephone operators, for variable periods $t_1, \ldots, t_{112}$. Of the 112
records, 6 cover approximately 1 year (between 0.8 and 1.4 years), 63
cover approximately 2 years (between 1.6 and 2.4 years), and 43 cover
approximately 3 years (between 2.5 and 3.1 years). Here we take a year
to be 240 working days. The attendance records in our sample do not
usually start, or end, at the beginning of a da, ia, or su period, and thus
two censored (i.e., incomplete) periods typically exist (these are usually
su periods) for each of the 112 records, one at each end of the record.
This situation is demonstrated in Fig. 1, which gives a schematic
example of an attendance record in our data. Note that holidays,
weekends, vacations, etc., have been deleted from the time axis. The
large proportion of censored su periods among all su periods
requires special attention and leads to an interesting analysis.

Frequency of absences, duration of absences, duration of su periods, 
relations between absence and age, etc., are all parts of the complete 
picture of "attendance behavior" of operators. We analyze these vari- 
ables below. In cases where our analysis suggests possible theoretical 
models that can adequately describe the behavior of the variables in 
question, we point out these models. 

Our analysis suggests that operators older than 35 are different from 
operators younger than 35 with regard to certain aspects of absence 
behavior; for the sake of brevity, we refer to the first group as older 
operators and to the second group as younger operators. 

2.2 Duration of IA's

A histogram of the durations of the 560 observed ia's is given in Fig.
2a. A simple theoretical model that fits these data to a remarkable
degree of accuracy is

$$ P[\text{duration} = j] = \begin{cases} p, & j = 1, \\ (1-p)\,2^{-(j-1)}, & j = 2, 3, \ldots, \end{cases} \tag{1} $$

with $p = \hat p = 346/560 = 0.6179$. (Note that if $x_1, \ldots, x_n$ is a random
sample with the probability density function (pdf) (1), then the maximum
likelihood estimator and the minimum variance unbiased estimator of $p$ are
both $\hat p = \sum_{i=1}^{n} I[x_i = 1]/n$, where $I[\,]$ denotes the indicator function;
in our case this gives $\hat p = 346/560$.) Figure 2b compares the model (1) with
the observed data, and the adequacy of the model is evident.
It is also interesting to note that a chi-square goodness-of-fit
test with size $\alpha$ does not reject the hypothesis that the durations of
ia's have the pdf (1), even when $\alpha$ is as high as 0.90!

Fig. 1: A schematic description of an attendance record. Note that the first and last
su periods ($X_1$ and $X_n$) are censored (only $X'_1$ and $X'_n$ are recorded in the sample).

Duration of IA (days)        1        2        3        4        5        6 or more
Observed frequency           0.6179   0.1982   0.0911   0.0518   0.0321   0.0089
P_j according to (1)         0.6179   0.1911   0.0955   0.0478   0.0239   0.0239
Difference                   0.0000   0.0071  -0.0044   0.0040   0.0082  -0.0150

Duration of IA (days)        1        2        3        4        5 or more
Observed count               346      111      51       29       23
Expected count, 560 × P_j    346      107      53       27       27
Difference                   0        4       -2        2       -4

Fig. 2: (a) A histogram of the durations of ia's (avg = 1.72, stdv = 1.16).
(b) Comparison between the pdf (1) and the observed durations of ia's.
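As a rough check on the fit of (1), the sketch below recomputes $\hat p$ and a chi-square statistic from the grouped counts of Fig. 2b (cells 1, 2, 3, 4 and "5 or more" days). The grouping and the degrees of freedom are our assumptions, since the text does not state which were used: with these cells the statistic is about 0.98, and its p-value is about 0.91 on 4 degrees of freedom, or about 0.81 if one more degree of freedom is deducted for estimating $p$. The chi-square tail is computed with a standard-library recursion rather than a statistics package.

```python
import math

# Grouped counts from Fig. 2b: durations 1, 2, 3, 4 and "5 or more" days.
OBSERVED = [346, 111, 51, 29, 23]


def model_probs(p):
    """Cell probabilities under pdf (1), with durations of 5 or more pooled."""
    q = 1.0 - p
    # P[X=1] = p, P[X=j] = q * 2**-(j-1) for j >= 2, so P[X >= 5] = q / 8.
    return [p, q / 2, q / 4, q / 8, q / 8]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df > 0.

    Starts from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2) and applies
    Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


n = sum(OBSERVED)
p_hat = OBSERVED[0] / n  # share of one-day absences: the MLE (and MVUE) of p
expected = [n * pr for pr in model_probs(p_hat)]
stat = sum((o - e) ** 2 / e for o, e in zip(OBSERVED, expected))

print(f"p_hat = {p_hat:.4f}, chi-square = {stat:.3f}")
for df in (3, 4):
    print(f"df = {df}: p-value = {chi2_sf(stat, df):.3f}")
```

Because $\hat p$ reproduces the one-day cell exactly, that cell contributes nothing to the statistic; all of the discrepancy comes from the longer durations.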

A random variable, say X, with the pdf (1) has the following 
interesting property: 

$$ P[X = k + j \mid X > k] = 2^{-j}, \qquad j = 1, 2, \ldots, \quad k = 1, 2, \ldots. \tag{2} $$

The interpretation of this property, when $X$ stands for the duration of
an ia, is the following: on the second day of an incidental absence, the
employee tosses a coin; if the result is heads, the employee returns to
work on the next day; otherwise the absence continues. The experiment is
repeated daily until the first time the result is heads, in which case the
employee returns to work on the following day. Should one try to
interpret this interesting property, exhibited by the data, in terms of
human behavior with regard to short absences?
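The coin-toss reading of (2) can be checked mechanically. The sketch below is our illustration, not part of the original analysis: it encodes pdf (1) with exact fractions, verifies (2) for small $k$ and $j$, and simulates durations by the coin-toss rule with $p = 346/560$ (the simulation seed and sample size are arbitrary choices). Under (1) the mean duration works out to $3 - 2p \approx 1.76$ days, which the simulation should reproduce; the observed average in Fig. 2a is 1.72.

```python
import random
from fractions import Fraction


def pmf(j, p):
    """P[X = j] under pdf (1): p for j = 1 and (1 - p) * 2**-(j-1) for j >= 2."""
    if j < 1:
        return Fraction(0)
    return p if j == 1 else (1 - p) * Fraction(1, 2 ** (j - 1))


def survival(k, p):
    """P[X > k] under pdf (1), for k >= 1: equals (1 - p) * 2**-(k-1)."""
    return (1 - p) * Fraction(1, 2 ** (k - 1))


def coin_toss_duration(p, rng):
    """One ia duration generated by the coin-toss mechanism.

    The absence ends after day 1 with probability p; otherwise a fair coin is
    tossed from day 2 on, and heads means a return to work the next day.
    """
    if rng.random() < p:
        return 1
    days = 2
    while rng.random() < 0.5:  # tails: the absence continues another day
        days += 1
    return days


p = Fraction(346, 560)

# Property (2): P[X = k + j | X > k] = 2**-j, checked exactly for small k, j.
for k in range(1, 8):
    for j in range(1, 8):
        assert pmf(k + j, p) / survival(k, p) == Fraction(1, 2 ** j)

rng = random.Random(2024)
draws = [coin_toss_duration(float(p), rng) for _ in range(50_000)]
print("simulated mean:", sum(draws) / len(draws))
print("model mean 3 - 2p:", float(3 - 2 * p))
print("share of one-day absences:", draws.count(1) / len(draws))
```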

We remark that the distribution of the duration of ia's for younger
operators is approximately the same as for older operators, and neither
deviates much from (1).

2.3 Duration of DA's

The following are summary statistics for the 78 da occasions in our 
data: 

lower quartile = 9.0, median = 13.5, upper quartile = 41.0,

mean (with the six most extreme observations removed) = 26.1,

standard deviation (with the six most extreme observations removed) = 24.2.

Of the 78 observations, 18 were incurred by operators younger
than 35. Figure 3 compares, by means of box plots (Ref. 4), the distributions
of the duration of da's in three cases: younger operators
(18 da occasions), older operators (60 da occasions), and the combined
sample (78 da occasions). The lower and upper sides of each box are
the lower and upper quartiles, respectively, and the segment inside the
box is the median. If $d$ denotes the distance between the quartiles,
then the whiskers extend to the most extreme data values lying within $1.5d$
of the nearer quartile. Points lying outside this range are plotted
individually. The figure suggests that long da's are more frequent
among the older operators. For instance, the upper quartile for the
older operators is 44.0, with the most extreme observation being 165,
while the corresponding figures for the younger operators are 32.0 and
44.0. This difference cannot be accounted for by differences in the total
sampling durations, because the distribution of the $t_i$'s is approximately
the same for the two age groups.

Fig. 3: Box plots for the duration of da's.
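The box-plot construction just described can be sketched as follows. The quartile convention (Python's `statistics.quantiles` with `method="inclusive"`) is our assumption, since the text does not say which one was used, and the durations in the usage lines are made up for illustration.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot.

    Assumes the 'inclusive' quantile method; other quartile conventions give
    slightly different boxes for small samples.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    d = q3 - q1  # distance between the quartiles
    low_fence, high_fence = q1 - 1.5 * d, q3 + 1.5 * d
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme value within 1.5d of Q1
        "whisker_high": max(inside),  # most extreme value within 1.5d of Q3
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


# Hypothetical durations (days), not the paper's data.
durations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_summary(durations))
```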

2.4 Duration of SU periods 

We use Fig. 1 as a vehicle to explain some basic concepts regarding
the censoring of su periods. Suppose the length of each individual su
period ($X_i$ in Fig. 1) is distributed according to the cumulative
distribution function (cdf) $F(u)$. Then, since the probability that a given
period covers the point $t_0$ is directly proportional to its length
$u$, the distribution function of the length of the interval that covers $t_0$
($X_1$ in Fig. 1) is $H(x) = \int_0^x u\,dF(u)/m$, where $m$ is a normalizing
constant equal to the mean of $F$. Given $X_1$, however, the distribution
of $X'_1$, which is the observable part of $X_1$, is uniform on the interval
$[0, X_1]$, so that (averaging over the distribution $H$ of $X_1$) the
unconditional distribution of $X'_1$ is

$$ G(y) = P[X'_1 \le y] = m^{-1} \int_0^y [1 - F(u)]\,du. \tag{3} $$

Since (3) is usually derived in the context of renewal processes, in
which case an assumption about the independence of different $X_i$'s (su
periods in our application) is built in, it is important to note that this
assumption is not used in the derivation of (3) (cf. Ref. 5, p. 66), and
therefore it is not assumed in our discussion. In the analysis that
follows, however, we assume (unless otherwise stated) that su periods
of different operators have the same cdf $F$, as long as they are in the
same age group.

It is clear that the argument leading to (3) also applies to $X'_n$, so that
(3) is the distribution of the censored su periods. An important
property of (3) is

$$ G = F \ \text{if, and only if,}\ F \text{ is an exponential distribution } \big[\text{i.e., } F(x) = 1 - e^{-x/\mu},\ x > 0,\ \mu > 0\big]. \tag{4} $$

Or, in words,

the censored su's and the completed su's have the same
distribution if, and only if, the distribution
of the su's is exponential. (4')
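The length-biasing behind (3) and property (4) can be seen in a small simulation. The sketch is illustrative only: the two period laws (exponential and uniform, both with mean 1) are hypothetical stand-ins for $F$, and the simulated periods are independent, which (3) itself does not require. By (3), the mean of $G$ is $E[X^2]/(2E[X])$, which exceeds $E[X]$ exactly when the coefficient of variation of $F$ exceeds 1. For the exponential law the censored remainder has the same mean as a complete period (1), while for the uniform law it is shorter on average (2/3).

```python
import random


def censored_remainder(draw, t0, rng):
    """Part after t0 of the su period that covers t0 (X'_1 in Fig. 1).

    Periods with lengths draw(rng) are laid end to end from time 0; t0 should
    be many mean lengths past 0 so that the process has settled down.
    """
    s = 0.0
    while s <= t0:
        s += draw(rng)
    return s - t0


def mean_censored(draw, t0=20.0, reps=10_000, seed=7):
    """Average censored remainder over independent replications."""
    rng = random.Random(seed)
    return sum(censored_remainder(draw, t0, rng) for _ in range(reps)) / reps


def draw_exponential(rng):
    return rng.expovariate(1.0)   # hypothetical F: exponential, mean 1


def draw_uniform(rng):
    return rng.uniform(0.0, 2.0)  # hypothetical F: uniform on [0, 2], mean 1


# Mean of G from (3): 1 for the exponential law (G = F), 2/3 for the uniform.
print("exponential:", mean_censored(draw_exponential))
print("uniform:    ", mean_censored(draw_uniform))
```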

2.4.1 Analysis for operators younger than 35

For the younger operators, Fig. 4a compares censored su's with
completed su's by means of a Q-Q plot (see Ref. 6, chapter 6). The
deviation of the plot from a 45-degree line through the origin is not
very large, and for practical purposes one can assume that the censored
su's and the completed su's follow the same distribution. More
formally, if we test the hypothesis that the two samples have the same
distribution using a Wald-Wolfowitz runs test, we observe 92 runs,
while the mean and standard deviation under the null hypothesis are
99.0 and 6.0, respectively, so that the hypothesis is not rejected at
significance levels of 0.12 or less. Thus, from (4'), we are led to the
conclusion that the distribution of the su's is exponential (or, at least,
that this is an adequate description of the data). Figure 4b is a
comparison of the combined su sample (censored and completed)
versus quantiles of the exponential distribution. The striking closeness
to linearity of this plot strongly supports the conclusion that the su's
are exponentially distributed. The estimated mean of the combined
sample is 60.2 days, and the standard deviation is 61.2 (which is very
close to the mean, as is to be expected for a sample from an exponential
distribution). In summary, for the purpose of fitting a parsimonious
model, one can assume that the periods between consecutive absences,
for operators of age 35 or less, follow an exponential distribution with
mean 60 days.

Fig. 4a: Q-Q plot of censored su's (Y axis, 65 observations) versus completed su's
(X axis, 199 observations), for operators younger than 35.

Fig. 4b: Q-Q plot of completed and censored su periods (Y axis, 264 observations)
versus exponential quantiles $-\log(1 - u)$ (quantiles of the standard exponential
distribution; X axis), for operators younger than 35.
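The runs-test figures quoted above follow from the usual normal approximation to the Wald-Wolfowitz statistic with 65 censored and 199 completed su's. The sketch below reproduces the null mean (about 99.0), the standard deviation (about 6.0) and a one-sided p-value of roughly 0.12 for 92 runs; the tie-breaking rule in `count_runs` is our assumption, since the text does not say how tied durations were handled.

```python
import math


def runs_moments(n1, n2):
    """Null mean and standard deviation of the Wald-Wolfowitz number of runs."""
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, math.sqrt(var)


def runs_p_value(runs, n1, n2):
    """One-sided normal-approximation P[R <= runs]; few runs suggest the samples differ."""
    mean, sd = runs_moments(n1, n2)
    z = (runs - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def count_runs(sample_a, sample_b):
    """Runs in the pooled sorted sample, labelled by origin.

    Ties are ordered by label here, which is an arbitrary convention.
    """
    pooled = sorted([(x, 0) for x in sample_a] + [(x, 1) for x in sample_b])
    labels = [label for _, label in pooled]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


mean, sd = runs_moments(65, 199)  # 65 censored and 199 completed su's
print(f"null mean = {mean:.1f}, sd = {sd:.1f}")
print(f"P[R <= 92] = {runs_p_value(92, 65, 199):.3f}")
```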

2.4.2 Analysis for operators older than 35 

Applying the same analysis as in the previous case, we point out an
interesting "data paradox" exhibited by the two su samples (censored
and completed), and we give possible explanations for this paradoxical
behavior of the data.

Figure 5a compares the censored su's with the completed su's. The 
deviation of the Q-Q plot from the 45-degree line through the origin 
is marked, and it is evident, therefore, that the completed and the 
censored su's have different distributions. Specifically, the censored 
su's appear to be stochastically bigger than the completed su's (the 
Q-Q plot is on or above the 45-degree line through the origin) and for 
comparison we look also at their summary statistics: 

(lower quartile, median, upper quartile, mean, stdv) = 
(20.0, 46.0, 88.0, 66.3, 71.2) for the completed su's, and 
(21.0, 47.0, 206.0, 113.3, 130.4) for the censored su's. 

In view of this situation and the assumption that su's of different
operators have the same cdf, (4') suggests that the distribution of the
completed su's cannot be exponential. Figure 5b, however, in which
we compare the completed su's with exponential quantiles, points in
the opposite direction. The closeness to linearity of the Q-Q plot (with
the exception of the upper 17 points) suggests that the completed su's
do follow an exponential distribution (or perhaps an exponential with
a 5-percent contamination).

Fig. 5a: Q-Q plot of censored su's (Y axis, 142 observations) versus completed su's
(X axis, 333 observations), for operators older than 35.
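A minimal sketch of the exponential Q-Q comparison used for Figs. 4b and 5b. The plotting positions $(i - 0.5)/n$ and the use of the through-origin slope as a rough read-off of the mean are our choices, and the simulated sample (333 values with mean 66.3, echoing the completed su's of older operators) is hypothetical.

```python
import math
import random


def exponential_qq(sample):
    """(theoretical, observed) pairs for a Q-Q plot against the standard exponential.

    Plotting positions u_i = (i - 0.5) / n are an assumed convention; the
    theoretical quantile is -log(1 - u_i).
    """
    xs = sorted(sample)
    n = len(xs)
    return [(-math.log(1.0 - (i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def slope_through_origin(points):
    """Least-squares slope of observed on theoretical quantiles, no intercept.

    For an exponential sample the points hug a line through the origin whose
    slope is roughly the mean.
    """
    sxy = sum(q * x for q, x in points)
    sxx = sum(q * q for q, _ in points)
    return sxy / sxx


rng = random.Random(3)
simulated = [rng.expovariate(1 / 66.3) for _ in range(333)]  # hypothetical sample
points = exponential_qq(simulated)
print("slope:", round(slope_through_origin(points), 1))
```

A contaminated sample would show up as the upper points bending away from the line, as in the upper 17 points of Fig. 5b.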

We give two possible explanations for this data paradox. The first is
that intrinsic differences in absence behavior might exist among the 79
operators older than 35, so that any attempt to fit a single cdf
to the su periods of these operators is meaningless [in mathematical
language, this means that in eq. (3) different operators are associated
with different distribution functions $F$, while we try to fit a single $F$],
and a more complicated model is needed. One possible model is that
operators can be naturally classified according to their
attendance behavior (good, bad, etc.). However, the sampling
periods ($t_i$'s) in our data are not long enough to enable us to decide
whether a given operator is intrinsically good, bad, etc., and thus we
have not pursued this model.

Fig. 5b: Q-Q plot of completed su's (Y axis, 333 observations) versus $-\log(1 - u)$
(quantiles of the standard exponential distribution), for operators older than 35.

In the second possible explanation, we show that a certain sampling-bias
effect could have been the source of our data paradox. Suppose
the cdf of su periods, which is $F$ of eq. (3), is a mixture of two cdf's: an
exponential cdf with mean $A$, which is small relative to the sampling
periods $t_i$, and a degenerate cdf that assigns unit mass to a point $B$
which is big relative to the $t_i$'s. This means that over a long period of
time, a certain proportion, say $a$, of the su's have an exponential
distribution, while the other su's last a fixed length $B$. Now any
sampling period of length $t$ satisfying $A \ll t < B$ cannot possibly
contain a completed su period of length $B$, so that all the completed
su's must come from the exponential population, and only censored su's
could possibly come from the $B$ population. In addition to being a model
that accommodates the "paradoxical" behavior of our data, this model
provides a useful framework for estimation. Under the model's
assumptions

$$ P[\text{su} > x] = 1 - F(x) = a\,e^{-x/A} + (1 - a)\,I[x < B], \tag{5} $$



where $I[\,]$ denotes the indicator function, so that, using (5) and (3),
the moments of the censored su's are

$$ E[(\text{censored su})^n] = \int_0^\infty x^n\,dG(x) = \frac{a\,(n+1)!\,A^{n+1} + (1-a)\,B^{n+1}}{(aA + (1-a)B)(n+1)}, \qquad n = 0, 1, \ldots. \tag{6} $$

Since our model assumes $A \ll t$, we have, to a good approximation,

$$ E[(\text{completed su})^n] \simeq \int_0^\infty x^n A^{-1} e^{-x/A}\,dx = n!\,A^n, \qquad n = 0, 1, \ldots, \tag{7} $$



OPERATOR ABSENTEEISM 23 



and therefore the method of moments, applied to (6) and (7), gives

$$ \begin{aligned} E[\text{completed su}] &= 66.3 = A, \\ E[\text{censored su}] &= 113.3 = \frac{2aA^2 + (1-a)B^2}{2(aA + (1-a)B)}, \\ E[(\text{censored su})^2] &= 29841.05 = \frac{6aA^3 + (1-a)B^3}{3(aA + (1-a)B)}, \end{aligned} \tag{8} $$

which yield the estimates $A = 66.3$, $B = 500.0$, $a = 0.96$.
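The moment equations (8) can also be solved in closed form. The sketch below is our own derivation, not necessarily the authors' procedure: it takes $A$ from the completed-su mean, eliminates $a$ between the two censored-su equations to obtain a quadratic in $B$, and then checks the solution against (6). Solved this way from the rounded inputs shown in (8), the estimates come out near $B \approx 560$ and $a \approx 0.97$ rather than the printed 500.0 and 0.96; the text alone does not tell us whether this reflects rounding, a different numerical procedure, or a misprint.

```python
import math


def censored_moment(n, a, A, B):
    """E[(censored su)^n] under model (5), as given by (6)."""
    num = a * math.factorial(n + 1) * A ** (n + 1) + (1 - a) * B ** (n + 1)
    return num / ((a * A + (1 - a) * B) * (n + 1))


def solve_moments(mean_completed, mean_censored, second_censored):
    """Solve the three equations (8) for A, B and a.

    A is the completed-su mean. Writing c = 1 - a, the censored-su equations
    become c*(B**2 - 2*m1*B) = a*k1 and c*(B**3 - 3*m2*B) = a*k2; dividing them
    eliminates a and leaves a quadratic in B. The larger root is used, since
    the smaller one (about 112 for these inputs) would put a outside [0, 1].
    """
    A, m1, m2 = mean_completed, mean_censored, second_censored
    k1 = 2 * A * (m1 - A)
    k2 = 3 * A * (m2 - 2 * A * A)
    r = k2 / k1  # (B**2 - 3*m2) / (B - 2*m1) = r
    disc = r * r - 4 * (2 * m1 * r - 3 * m2)
    B = (r + math.sqrt(disc)) / 2
    c_over_a = k1 / (B * B - 2 * m1 * B)
    return A, B, 1 / (1 + c_over_a)


A, B, a = solve_moments(66.3, 113.3, 29841.05)
print(f"A = {A:.1f}, B = {B:.1f}, a = {a:.3f}")
# Plugging the solution back into (6) should return the sample moments.
print(censored_moment(1, a, A, B), censored_moment(2, a, A, B))
```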

Fig. 5c: The dotted line is $\hat F(x) = \hat P_{\text{older}}[\text{su} \le x]$ (using a modification of the
Kaplan-Meier estimator, Section 2.4). The solid line is the contaminated exponential
distribution of eq. (9).

Though the above model accommodates the type of behavior
demonstrated by our data, so do other models based on a contaminated
exponential distribution, and the question of finding a model that fits
our data well has not been answered yet. Toward this end we derived
a nonparametric estimate of $F$, denoted $\hat F$, by tailoring the Kaplan-Meier
estimator (Ref. 7) to our application, in which each completed
observation has to be counted with multiplicity two. (The exact details of
this estimator will be discussed in a separate paper.) The result is
given in Fig. 5c. The contaminated exponential model

$$ F(x) \approx 0.94\,(1 - e^{-x/70}) + 0.06\,(\ldots)\,I[0 < x < 540], \qquad x > 0, \tag{9} $$



is superimposed on this figure, and it seems to fit the data rather well.
Figure 5d compares (9) with $\hat F(x)$ by means of a Q-Q plot, and the
closeness of the plot to the 45-degree line reassures us about the
adequacy of the model.

The behavioral interpretation of (9) is that usually (i.e., 94 percent 
of the time) the duration of su periods follows an exponential distri- 
bution with mean 70 days, while occasionally (i.e., 6 percent of the 
time) an su period can be much longer (perhaps 500 to 540 days). We 
note that the average su period, according to (9), is approximately 94 
days, which is substantially bigger than the corresponding number for 
the younger operators (60 days). 
Fig. 5d: A Q-Q plot of $\hat F(x) = \hat P_{\text{older}}[\text{su} \le x]$ (using a modification of the Kaplan-Meier
estimator, Section 2.4) versus the contaminated exponential distribution of
eq. (9).
An important aspect of our data paradox, and of the contaminated
exponential model, is that they indicate that a follow-up period of one or
two years is not sufficiently long for evaluating the attendance behavior
of operators. This observation has implications for our discussion of
evaluation procedures.

2.5 Frequency of absences and total time lost (TTL) due to absences 

Figure 6 gives box plots of the frequency of ia's (occasions per year)
for the two age groups and for the combined sample. (The nonoverlapping
of the notches in the first and second boxes indicates a difference
between the two medians at roughly the 5-percent significance level;
see Ref. 4.) One can immediately see that younger operators tend to have
substantially more ia's. Note again that this difference cannot be accounted
for by differences in the total sampling durations, because the
distribution of the $t_i$'s is approximately the same for the two age groups.

The situation regarding da's is somewhat reversed, as one can see
from Tables I and II. For example, while the proportion of the younger
operators in the sample is 29 percent, the proportion of the da
occasions incurred by them is only 23 percent. We also see in Table II
that the ratio of "ttl due to da" to "ttl due to ia" is 0.022/0.021 for
younger operators, while it is 0.045/0.014 for older operators. This,
together with the fact that the probability distribution of the da duration
for older operators has a substantially longer tail than the corresponding
distribution for younger operators (Fig. 3), explains why the ttl
caused by absences is somewhat higher for older operators (5.9 percent)
than for younger operators (4.2 percent).

Fig. 6: Box plots for the frequency of ia (occasions per year).

Table I: DA occasions and age

              No. of individuals     No. of occasions    No. of operators in
              with occasions of DA   of DA               the entire sample
Age < 35      13 (25%)               18 (23%)            33 (29%)
Age > 35      40 (75%)               60 (77%)            79 (71%)
Total         53 (100%)              78 (100%)           112 (100%)

The first line of Table II shows that a period that starts at the end
of an absence and ends at the end of the following absence (ia or da)
lasts, on the average, 72 days for younger operators and 107 days
for older operators.
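The entries of Table II are quotients of a few raw totals, so they are easy to recompute. The sketch below uses only counts printed in Tables I and II, takes the TTL from da's as the total TTL minus the TTL from ia's, and pools the raw counts for the combined column. As a cross-check, the pooled figures give 638 − 78 = 560 ia occasions and 342 + 620 = 962 days lost to ia's, about 1.72 days per ia, in agreement with the 560 observed ia's and the average reported in Fig. 2a.

```python
# Counts printed in Tables I and II (days are working days).
GROUPS = {
    # group: (sampling days, absence occasions, da occasions, TTL ia, TTL all)
    "age < 35": (16601, 232, 18, 342, 705),
    "age > 35": (43439, 406, 60, 620, 2561),
}


def table_ii_column(days, occasions, da_occasions, ttl_ia, ttl_all):
    """Ratios of Table II for one column; TTL from da's = total TTL - TTL from ia's."""
    return {
        "days per absence occasion": days / occasions,
        "da share of occasions": da_occasions / occasions,
        "TTL ia / sampling": ttl_ia / days,
        "TTL da / sampling": (ttl_all - ttl_ia) / days,
        "TTL all / sampling": ttl_all / days,
    }


# The combined column pools the raw counts of both age groups.
combined = tuple(sum(values) for values in zip(*GROUPS.values()))
columns = {name: table_ii_column(*counts) for name, counts in GROUPS.items()}
columns["combined"] = table_ii_column(*combined)

for name, ratios in columns.items():
    print(name, {k: round(v, 3) for k, v in ratios.items()})

# Cross-check with Section 2.2: ia occasions and ia days in the pooled sample.
ia_occasions = combined[1] - combined[2]
print(ia_occasions, combined[3], round(combined[3] / ia_occasions, 2))
```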

2.6 The number of IA occasions (IAO) over a fixed period of time

For later applications we want to derive an estimate of the
probability distribution of the number of iao over a fixed period of time, for
an arbitrary operator in our sample. To keep the analysis and the
presentation simple, we ignore, for the time being, the differences
between younger and older operators. Later we will comment on the
corresponding analysis when the difference in attendance between the
two age groups is taken into account. It is well known (e.g., Ref. 5,
p. 104) that under fairly weak assumptions about the statistical
behavior of the periods between consecutive ia's, the quantity
$\sqrt{t}\,[N(t)/t - b]$ has a limiting distribution as $t \to \infty$. Here $N(t)$
denotes the number of iao over a time period of length $t$, and $b$ is the
average number of iao per unit time. We take this theoretical model as a
framework for producing estimates of the distribution of $N(t)$ for a given $t$.
A scatter plot of $(t_i, N(t_i))$, $i = 1, \ldots, 112$, is given in Fig. 7, and one
can see immediately that $\mathrm{VAR}(N(t))$ increases with $t$ (this is usually
referred to as heteroscedasticity), as is to be expected from our model. Note
that our model implies that the mean and the variance of $N(t)$ are
approximately linear in $t$ for large values of $t$. The regression estimate
of $b$ in the model

$$ N(t_i)/\sqrt{t_i} = b\sqrt{t_i} + U_i \tag{10} $$

is $\hat b = 2.24$ (with a t-statistic of 15.36). Figure 8 compares (by means of
box plots) the residuals $\hat U_i$ of the regression (10) for periods of
length $t_i < 2.5$ years with the $\hat U_i$'s for periods of length $t_i > 2.5$ years.
The choice of $t = 2.5$ as a cutoff point seems natural from the
distribution of the $t_i$'s (see the second paragraph of Section 2.1); other
cutoffs in the neighborhood of 2.5 give similar results. The comparison
shows that the heteroscedasticity of the data (Fig. 7) is eliminated and that
the $U_i$'s can be considered as random variables satisfying
$\mathrm{VAR}\,U_i = \sigma^2 > 0$, independent of $t$, so that the empirical
distribution of the $\hat U_i$'s can be used to estimate the distribution of $N(t)$.

Table II: Age comparison of certain absence characteristics (DA = disability
absence, IA = incidental absence, TTL = total time lost)

                                    Age < 35             Age > 35             Combined sample
total sampling periods* /
  total no. of absence occasions    16601/232 = 71.56    43439/406 = 106.99   60040/638 = 94.11
no. of DA occasions /
  no. of IA + DA occasions          18/232 = 0.078       60/406 = 0.148       78/638 = 0.122
TTL due to IA /
  total sampling periods            342/16601 = 0.021    620/43439 = 0.014    962/60040 = 0.016
TTL due to DA /
  total sampling periods            363/16601 = 0.022    1941/43439 = 0.045   2304/60040 = 0.038
TTL due to absences /
  total sampling periods            705/16601 = 0.042    2561/43439 = 0.059   3266/60040 = 0.054

* Total sampling periods = the sum of all the observation periods across operators
(i.e., the sum of the $t_i$'s of Fig. 1).

Fig. 7: A scatter plot of the number of ia occasions, $N(t_i)$ (Y axis), versus the
sampling duration, $t_i$ years (X axis), for the 112 operators.

Fig. 8: Comparison of $\hat U_i = \sqrt{t_i}\,[N(t_i)/t_i - \hat b]$ for small and large values of $t_i$
($t_i < 2.5$ years and $t_i > 2.5$ years).
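Because (10) has no intercept, its least-squares estimate of $b$ reduces to the pooled rate $\sum N(t_i) / \sum t_i$. The sketch below, using made-up records, computes $\hat b$, the residuals $\hat U_i$, and a t-statistic with $n - 1$ degrees of freedom; its last line checks that the pooled totals of this study (560 ia occasions over 60040/240 ≈ 250.2 years) give about 2.24, in line with the estimate quoted above.

```python
import math


def fit_rate(counts, years):
    """Least-squares fit of (10), N_i / sqrt(t_i) = b * sqrt(t_i) + U_i, without intercept.

    With y_i = N_i / sqrt(t_i) and x_i = sqrt(t_i), sum(x*y) = sum(N_i) and
    sum(x*x) = sum(t_i), so b_hat is the pooled rate sum(N_i) / sum(t_i).
    The residuals are U_i = sqrt(t_i) * (N_i / t_i - b_hat).
    """
    b_hat = sum(counts) / sum(years)
    residuals = [math.sqrt(t) * (n / t - b_hat) for n, t in zip(counts, years)]
    return b_hat, residuals


def t_statistic(b_hat, residuals, years):
    """b_hat divided by its standard error (n - 1 degrees of freedom, no intercept)."""
    s2 = sum(u * u for u in residuals) / (len(residuals) - 1)
    return b_hat / math.sqrt(s2 / sum(years))


# Hypothetical records: ia counts N(t_i) over sampling periods t_i in years.
counts = [1, 5, 6, 2, 9]
years = [1.0, 2.0, 3.0, 2.2, 2.9]
b_hat, residuals = fit_rate(counts, years)
print(f"b_hat = {b_hat:.3f}, t = {t_statistic(b_hat, residuals, years):.2f}")

# Pooled totals of this study: 560 ia's over 60040 working days (240 per year).
print(f"pooled rate = {560 / (60040 / 240):.3f} ia's per year")
```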

Theoretically, if (i) our assumptions (e.g., that all operators behave
according to the same probability law) were completely realistic and
(ii) $t$ were very large, then the distribution of $U$ would be close to a
normal distribution, and we could use this fact to estimate the
distribution of $N(t)$. Since, however, neither (i) nor (ii) is entirely correct,
we do not rely on the asymptotic normality of $U$. Instead, we use the
empirical distribution of the $\hat U_i$'s as an estimate of the distribution of
$U$, and hence obtain an estimate for the distribution of $bt + \sqrt{t}\,U$. In
practice, however, since $N(t)$ is restricted to the nonnegative integers,
we look at

$$ n_i(t) = \max\{0, [\hat b t + \sqrt{t}\,\hat U_i + \tfrac{1}{2}]\}, \tag{11} $$

where $[x]$ denotes the integer part of $x$, and we estimate the
distribution of $N(t)$ by

$$ \hat P_t(j) \equiv \hat P[N(t) = j] = \frac{\#\{i : n_i(t) = j\}}{112}. \tag{12} $$

Table III gives $\hat P_t(j)$ for $t = 1, 2, 3$ years.

Table III: P[N(t) = j], estimated probability that the number of IA
occasions over a period of length t equals j

  j        1 year    2 years   3 years
  0        0.29      0.09      0.01
  1        0.13      0.16      0.08
  2        0.14      0.11      0.08
  3        0.12      0.07      0.09
  4        0.17      0.11      0.11
  5        0.08      0.07      0.07
  6        0.03      0.10      0.07
  7        0.02      0.12      0.10
  8        0.02      0.07      0.05
  9        0.0       0.04      0.06
  10       0.0       0.02      0.13
  11       0.0       0.02      0.05
  12       0.0       0.02      0.04
  13       0.0       0.0       0.02
  14       0.0       0.0       0.02
  15       0.0       0.0       0.02
  Total    1.00      1.00      1.00
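A sketch of (11) and (12), assuming $\hat b$ and the residuals $\hat U_i$ have already been obtained from (10); the fitted values in the usage lines are made up. `math.floor` stands in for the integer part $[x]$, which makes no difference here because negative values are mapped to 0 anyway.

```python
import math
from collections import Counter


def predictive_distribution(b_hat, residuals, t):
    """Estimate P[N(t) = j] by (11)-(12).

    Each fitted residual U_i gives n_i(t) = max(0, [b_hat*t + sqrt(t)*U_i + 1/2]),
    and the estimate is the share of the n_i(t) equal to j. math.floor and the
    integer part differ only for negative x, which max() maps to 0 anyway.
    """
    n_t = Counter(
        max(0, math.floor(b_hat * t + math.sqrt(t) * u + 0.5)) for u in residuals
    )
    total = len(residuals)
    return {j: n_t[j] / total for j in sorted(n_t)}


# Hypothetical fitted values, not the paper's.
b_hat = 2.24
residuals = [-2.5, -1.1, -0.4, 0.0, 0.3, 0.9, 1.8, 2.6]
for t in (1, 2, 3):
    print(t, predictive_distribution(b_hat, residuals, t))
```

With the 112 residuals of the actual regression in place of the made-up list, the denominator in (12) is 112, as in the text.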

Comment: In view of the difference between younger and older
operators, it would have been more appropriate to estimate $P[N(t) = j]$
separately for younger and for older operators, and then to use
their relative weights in the entire sample to obtain a final estimate.
That is,

$$ \hat P[N(t) = j] = \frac{33}{112}\,\hat P_{\text{younger}}[N(t) = j] + \frac{79}{112}\,\hat P_{\text{older}}[N(t) = j]. $$

The actual values of the estimates using this method are close to the
values of the estimates we obtained (Table III) without partitioning
the sample, and therefore we do not give the details of this calculation.
Note that the estimates of (12) are motivated by a model that imposes
very few assumptions on the data. Other possible frameworks for
estimation, which impose more conditions on the data (e.g., Poisson
arrivals of the ia's), result in estimates for which we feel that the
assumptions, rather than the data, determine the actual values of the
estimates. However, with more absenteeism data (in particular, longer
$t_i$'s) it is possible to identify a useful parametric model for estimating
$P[N(t) = j]$.

Table IV: P[L(t) <= j], estimated probability that the TTL from IA's, over a
period of length t, is at most j days

  j        1 year    2 years   3 years
  0        0.29      0.09      0.01
  1        0.37      0.19      0.06
  2        0.45      0.26      0.11
  3        0.52      0.32      0.16
  4        0.60      0.38      0.21
  5        0.68      0.43      0.26
  6        0.75      0.48      0.31
  7        0.81      0.53      0.36
  8        0.86      0.58      0.41
  9        0.90      0.63      0.46
  10       0.93      0.68      0.50
  11       0.95      0.73      0.54
  12       0.97      0.77      0.58
  13       0.98      0.81      0.62
  14       0.99      0.84      0.66
  15       1.00      0.87      0.70
  16                 0.89      0.74
  17                 0.91      0.78
  18                 0.93      0.81
  19                 0.95      0.84
  20                 0.96      0.87
  21                 0.97      0.90
  22                 0.98      0.92
  23                 0.99      0.94
  24                 1.00      0.96

Let L(t) denote the TTL from the N(t) occasions of IA. Clearly,

$$L(t) = X_1 + X_2 + \cdots + X_{N(t)}, \qquad (13)$$

where $X_i$ denotes the duration of the ith IA. Assuming that the $X_i$'s are independent of N(t), we have

$$P[N(t) = n,\ L(t) = l] = P\Big[\sum_{i=1}^{n} X_i = l\Big]\,P[N(t) = n]. \qquad (14)$$

Combining the estimates (12) and (14) with (15), we obtain the joint probabilities of N(t) and L(t) for t = 1, 2, 3 years. The marginal distributions of N(t) and L(t) (Tables III and IV, respectively) are then used to construct Tables Va, b, and c, as described below.
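Equation (14) weights the n-fold convolution of the IA durations by P[N(t) = n]. The sketch below relies on one assumption beyond the independence from N(t) stated above: the $X_i$'s are taken to be i.i.d. integer numbers of days. The duration distribution `P_X` is a hypothetical placeholder. `P_N1` is the 1-year column of Table III. The sketch prints the cumulative distribution of L(1) implied by these inputs, which is the analogue of the 1-year column of Table IV. It will not reproduce Table IV, because `P_X` is made up.

```python
from collections import defaultdict


def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = defaultdict(float)
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] += pa * qb
    return dict(out)


def joint_n_l(p_n, p_x):
    """Eq. (14): P[N = n, L = l] = P[X_1 + ... + X_n = l] * P[N = n]."""
    joint = {}
    sum_dist = {0: 1.0}  # distribution of X_1 + ... + X_n, starting from the empty sum
    for n in range(max(p_n) + 1):
        for l, ps in sum_dist.items():
            joint[(n, l)] = ps * p_n.get(n, 0.0)
        sum_dist = convolve(sum_dist, p_x)
    return joint


def marginal_l(joint):
    """Sum the joint probabilities over n to obtain P[L = l]."""
    out = defaultdict(float)
    for (_, l), p in joint.items():
        out[l] += p
    return dict(sorted(out.items()))


# 1-year column of Table III.
P_N1 = {0: 0.29, 1: 0.13, 2: 0.14, 3: 0.12, 4: 0.17, 5: 0.08, 6: 0.03, 7: 0.02, 8: 0.02}
# Hypothetical IA-duration distribution in days, for illustration only.
P_X = {1: 0.6, 2: 0.3, 3: 0.1}

if __name__ == "__main__":
    cumulative = 0.0
    for l, p in marginal_l(joint_n_l(P_N1, P_X)).items():
        cumulative += p
        print(l, round(cumulative, 2))
```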

2.7 Constructing Tables Va, b, and c

Tables Va, b, and c are the building blocks of our proposed evaluation procedure (Section III). Understanding their construction enables the user to interpret the ratings $R_a$, $R_b$, and $R_c$ that make up the attendance evaluation scheme.

To each possible value of N(1), the number of IA's in a single year, and to each possible value of L(1), the total number of days lost in these IA's, we attach a grade and a score. The distribution of N(1) is given in Table III. Values of N(1) that lie in the lower 5 percent of this distribution are given the grade Excellent, and their scores vary between 100 and 95. Values of N(1) that lie between the 6th and the 25th percentiles are given the grade Very Good, and their scores vary between 94 and 75, etc. [The particular score depends on how many values of N(1) fall in this range. For example, if only one value of N(1) lies between the 6th and 25th percentiles, its score is 94 (e.g., the rightmost column of Table Vb). If there are two values, their scores are 94 and 84 = 94 - (1/2)(94 - 75) (e.g., the lowermost row of Table Vc). If there are three values, they get the scores 94, 88 = 94 - (1/3)(94 - 75), and 81 = 94 - (2/3)(94 - 75), and so on.] We treat L(1) similarly, using the estimated distribution in Table IV. Table VI gives the details of the grading and scoring method. It is used for N(t) and L(t), t = 1, 2, 3.

Table Va: Scoring table on the basis of one-year attendance (rows: TTL, L(1), in days; columns: number of occasions, N(1))

L(1)        0    1    2    3    4    5    6   Grade  Score
   0      100                                 E        100
   1            74                            G         74
   2            68   55                       G         62
   3            60   49   43                  F         49
   4            56   46   40   32             F         43
   5            52   43   37   30   23        F         37
   6            48   39   34   27   21        F         31
   7            42   34   30   24   18        P         24
   8            38   31   27   22   17        P         20
   9            34   28   24   20   15        P         16
  10            30   24   21   17   13        P         12
  11            24   20   17   14   11        P          8
  12                                          U

Grade       E    G    F    F    P    P    U
Score     100   74   49   37   24   14

Table Vb: Scoring table on the basis of two-year attendance (rows: TTL, L(2), in days; columns: number of occasions, N(2))

L(2)        0    1    2    3    4    5    6    7    8    9   10   Grade  Score
   0      100                                                     E        100
   1            94                                                V         94
   2            83   74                                           G         74
   3            81   71   65                                      G         69
   4            78   69   63   56                                 G         64
   5            74   66   60   54   49                            G         59
   6            71   63   58   51   47   42                       G         54
   7            68   60   55   49   45   40   34                  F         49
   8            64   57   52   46   42   38   32   28             F         44
   9            61   54   49   44   40   36   31   26   22        F         39
  10            57   50   46   41   37   33   29   25   20        F         34
  11            52   46   42   38   34   31   26   23   19        F         29
  12            47   42   39   34   31   28   24   21   17        P         24
  13            45   40   37   33   30   27   23   20   16        P         22
  14            43   38   35   31   29   26   22   19   15        P         20
  15            41   36   33   30   27   24   21   18   15        P         18
  16            39   34   31   28   26   23   20   17   14        P         16
  17            36   32   29   26   24   21   18   16   13        P         14
  18            34   30   27   24   22   20   17   15   12        P         12
  19            31   27   25   22   20   18   15   13   11        P         10
  20                                                              U

Grade       E    V    G    G    F    F    F    P    P    P    U
Score     100   94   74   62   49   41   33   24   18   12
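The grading rule of Table VI and the even spacing of scores within a band can be sketched as follows. Two details come from our reading of the tables rather than from statements in the text, so treat them as assumptions. First, a value v is graded by the percentile of P[X ≤ v]. Second, fractional scores are truncated. With these choices the sketch reproduces both margins of Table Va from the 1-year columns of Tables III and IV. A few margin entries of Tables Vb and Vc, however, look rounded rather than truncated. The function names are illustrative.

```python
import math
from itertools import accumulate

# Table VI: (upper percentile, grade, top score, bottom score).
BANDS = [
    (5, "E", 100, 95),
    (25, "V", 94, 75),
    (50, "G", 74, 50),
    (75, "F", 49, 25),
    (95, "P", 24, 5),
]


def grade_of(percent):
    """Grade letter for an integer percentile; above the 95th is Unsatisfactory."""
    for upper, grade, _, _ in BANDS:
        if percent <= upper:
            return grade
    return "U"


def grades_and_scores(cdf):
    """(grade, score) for v = 0, 1, 2, ..., where cdf[v] is the estimated P[X <= v]."""
    # Assumption: value v is graded by the percentile of P[X <= v].
    grades = [grade_of(round(100 * c)) for c in cdf]
    ranges = {grade: (top, bottom) for _, grade, top, bottom in BANDS}
    result = []
    for v, grade in enumerate(grades):
        if v == 0:
            result.append(("E", 100))  # no absences: always Excellent with score 100
        elif grade == "U":
            result.append(("U", None))  # Unsatisfactory carries no score
        else:
            top, bottom = ranges[grade]
            members = [w for w in range(1, len(grades)) if grades[w] == grade]
            k, i = len(members), members.index(v)
            # Scores evenly spaced down from the top of the band, truncated (assumption).
            result.append((grade, math.floor(top - i * (top - bottom) / k)))
    return result


if __name__ == "__main__":
    n1_pmf = [0.29, 0.13, 0.14, 0.12, 0.17, 0.08, 0.03, 0.02, 0.02]  # Table III, 1 year
    print(grades_and_scores(list(accumulate(n1_pmf))))
```

With the 1-year column of Table III, the printed list matches the bottom margin of Table Va: Excellent 100, Good 74, Fair 49 and 37, Poor 24 and 14, then Unsatisfactory.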

An exception to Table VI is made when N(t) = 0 [and hence L(t) = 0]. In that case the grade is Excellent and the score is 100, regardless of whether N(t) = 0 lies in the lower 5 percent of the distribution of N(t) [note that P(N(1) = 0) = 0.29 and P(N(2) = 0) = 0.09; see Table III]. The scores (and grades) associated with each value of N(1) and L(1) are written on the margins of Table Va. Each entry in the body of the table is the geometric mean of the corresponding marginal scores; for example, $R_a(N(1) = 3, L(1) = 4) = \sqrt{37 \times 43} \approx 40$. We use the geometric mean to combine the marginal scores because it gives the equicontours of the resulting table a desirable shape. More specifically, we observe the following attractive properties:

(i) The ratings decrease along the west-east and north-south directions.

(ii) Each entry in the table is slightly greater than (or equal to) its north-east neighboring entry. This implies that a reduction in the number of IA occasions is desirable even at the expense of a slight increase in the TTL.

(iii) An operator is rated Unsatisfactory whenever at least one margin is rated as such.
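The body of Table Va can be regenerated from its two margins with the geometric-mean rule. In this sketch the marginal scores are copied from Table Va, and an Unsatisfactory margin is represented by `None`. Several other details are assumptions. The feasibility test (zero occasions exactly when zero days are lost, and at least one day per occasion) is inferred from which cells of the table are filled. Rounding to the nearest integer is assumed. Values beyond the last row or column are assumed to fall in the Unsatisfactory margin.

```python
import math

# Marginal scores of Table Va; None marks an Unsatisfactory margin (no score).
N_SCORES = [100, 74, 49, 37, 24, 14, None]  # N(1) = 0, 1, ..., 6
L_SCORES = [100, 74, 62, 49, 43, 37, 31, 24, 20, 16, 12, 8, None]  # L(1) = 0, 1, ..., 12


def rating(n, l):
    """Geometric mean of the marginal scores for N(1) = n, L(1) = l; None = Unsatisfactory."""
    if (n == 0) != (l == 0) or l < n:
        raise ValueError("infeasible combination of occasions and days lost")
    # Values beyond the last row or column are treated as Unsatisfactory (assumption).
    sn = N_SCORES[min(n, len(N_SCORES) - 1)]
    sl = L_SCORES[min(l, len(L_SCORES) - 1)]
    if sn is None or sl is None:
        return None  # property (iii): any Unsatisfactory margin makes the rating Unsatisfactory
    return round(math.sqrt(sn * sl))


if __name__ == "__main__":
    print(rating(3, 4))  # 40, the worked example in the text
    print([rating(n, 5) for n in range(1, 6)])  # row L(1) = 5 of Table Va
```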

Tables Vb and Vc are constructed similarly, with the obvious substitutions of (N(2), L(2)) and (N(3), L(3)) for (N(1), L(1)).

Table Vc: Scoring table on the basis of three-year attendance (rows: TTL, L(3), in days; columns: number of occasions, N(3))

L(3)        0    1    2    3    4    5    6    7    8    9   10   11   12   13   Grade  Score
   0      100                                                                    E        100
   1            94                                                               V         94
   2            91   86                                                          V         89
   3            89   84   79                                                     V         84
   4            86   81   76   72                                                V         79
   5            83   79   74   70   66                                           G         74
   6            81   77   72   68   64   60                                      G         70
   7            79   74   70   66   62   57   53                                 G         66
   8            76   72   68   64   60   55   52   48                            G         62
   9            74   70   66   62   58   53   50   46   42                       G         58
  10            71   67   63   60   56   51   48   45   41   36                  G         54
  11            68   64   60   57   53   49   46   43   39   34   30             F         49
  12            65   61   58   54   51   47   44   41   37   33   28   23        F         45
  13            62   59   55   52   49   45   42   39   36   31   27   22        F         41
  14            59   56   52   49   46   43   40   37   34   30   26   21        F         37
  15            56   53   49   47   44   40   38   35   32   28   24   20        F         33
  16            52   49   46   44   41   38   35   33   30   26   23   19        F         29
  17            47   45   42   40   37   34   32   30   27   24   21   17        P         24
  18            44   42   39   37   35   32   30   28   26   22   19   16        P         21
  19            41   39   36   34   32   30   28   26   24   21   18   15        P         18
  20            38   35   33   31   29   27   25   24   22   19   16   13        P         15
  21            34   32   30   28   26   24   23   21   19   17   15   12        P         12
  22            29   27   26   24   23   21   20   18   17   15   13   10        P          9
  23            24   22   21   20   19   17   16   15   14   12   10    8        P          6
  24                                                                             U

Grade       E    V    V    G    G    G    F    F    F    F    P    P    P    U
Score     100   94   84   74   66   58   49   43   37   31   24   18   12

Table VI: Grades and scores for N(t) and L(t)

Percentile range   Grade            Score range
0-5                Excellent        100-95
6-25               Very Good        94-75
26-50              Good             74-50
51-75              Fair             49-25
76-95              Poor             24-5
96-100             Unsatisfactory

III. EVALUATING ATTENDANCE AT WORK
3.1 Introduction

The attendance behavior of an operator is one of the most important components of the operator's overall performance, so it is evaluated regularly. In particular, it is weighed very carefully when the operator is considered for a transfer or promotion. So far, however, attendance has been assessed in local terms (compared to other operators in the office), and naturally this is done in a subjective and informal way. Although the informality is an advantage for both management and employees, the subjectivity of the evaluation is not. A scheme that allows an objective and consistent evaluation of attendance would be of potential use to line management.

In Section II we studied the statistical aspects of absenteeism in detail. Our analysis, in particular Section 2.4, suggested that one year is too short a period for evaluating attendance behavior as a personal characteristic. The far past, on the other hand, bears little relevance to recent attendance behavior and thus should not be included in the attendance evaluation. In this section we suggest an evaluation method based on the present and the near past (the three most recent years). It reflects the current-year attendance as well as attendance behavior in a more general sense.

We recall, however, from the analysis of Section II that disability absences (DA's) are intrinsically different from incidental absences (IA's). DA's have a highly variable duration and occur infrequently. This makes it hard to give meaningful statistical guidelines as to what can be considered good, bad, etc., behavior regarding DA's. Furthermore, management can do practically nothing to control DA's. We therefore base our attendance evaluation on IA's only.

In a sensitive issue such as absenteeism from work, the numerical values of the attendance rating do not always tell the whole story. Any evaluation method might occasionally misjudge good employees if it is used in a formal and rigid manner. The best way to avoid this is to use the method as an informal tool. One has to keep in mind that for every absence there is a reason, and these reasons are not reflected in the formal attendance ratings.

3.2 The evaluation procedure

The proposed scheme is best explained with an example. Consider an operator who started work in January 1970. Her IA occurrences and total time lost (TTL) are given in Table VII.

Table VII: Record of IA's

Year                   1970  1971  1972  1973  1974  1975  1976  1977  1978
No. of IA occasions       1     4     2     0     0     4     2     0     1
TTL due to IA's           1     6     3     0     0     5     4     0     1
To evaluate the operator's attendance, we propose the following simple procedure:

(i) Determine the first three lines of Table VIII as follows. For each year, use Table VII to calculate N(t) and L(t) (t = 1, 2, 3). These are, respectively, the number of IA occasions and the TTL due to IA's during the t most recent years.

(ii) For each year, read the ratings associated with (N(1), L(1)), (N(2), L(2)), and (N(3), L(3)) from Tables Va, b, and c. These ratings are written in lines 4, 5, and 6 of Table VIII, respectively. Their interpretation [in terms of percentiles of the marginal distributions of N(t) and L(t), t = 1, 2, 3] is described in Section 2.7.

(iii) Determine line 7 of Table VIII, the attendance index $R_j$ of the jth year, according to the following formula:

$$
R_j =
\begin{cases}
\operatorname{avg}(R_{a,j}, R_{b,j}, R_{c,j}), & \text{if } R_{a,j-1} > R_{a,j},\\
\max\{R_{j-1},\ \operatorname{avg}(R_{a,j}, R_{b,j}, R_{c,j})\}, & \text{if } R_{a,j-1} \le R_{a,j}.
\end{cases}
$$

For example, in calculating $R_{1975}$ we first compare $R_{a,1975}$ with $R_{a,1974}$. Since $R_{a,1974} = 100 > 30 = R_{a,1975}$, we take $R_{1975} = \operatorname{avg}(R_{a,1975}, R_{b,1975}, R_{c,1975}) = (30 + 54 + 70)/3 \approx 51$. On the other hand, in calculating $R_{1976}$, comparing $R_{a,1975}$ with $R_{a,1976}$ shows that $R_{a,1975} = 30 < 46 = R_{a,1976}$. Hence $R_{1976} = \max\{51, (46 + 36 + 53)/3\} = \max\{51, 45\} = 51$.

(iv) The formal evaluation consists of two indices. The first is $R_a$ (line 4), the current-year rating. The second is R (line 7), which can be considered an index of attendance behavior, or in short the attendance index. Here we view attendance behavior as a personal characteristic of the operator.
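Steps (i) and (iii) are mechanical and can be sketched in Python. The yearly record below is Table VII. The ratings passed in are the values that the text quotes for 1975 and 1976, as read from Tables Va, b, and c. Rounding the average to the nearest integer is an assumption, since the text reports only rounded values. The names `RECORD`, `window_totals`, and `attendance_index` are hypothetical.

```python
# Table VII: year -> (number of IA occasions, TTL in days).
RECORD = {
    1970: (1, 1), 1971: (4, 6), 1972: (2, 3), 1973: (0, 0), 1974: (0, 0),
    1975: (4, 5), 1976: (2, 4), 1977: (0, 0), 1978: (1, 1),
}


def window_totals(record, year, t):
    """Step (i): (N(t), L(t)) summed over the t most recent years ending with `year`."""
    years = [y for y in range(year - t + 1, year + 1) if y in record]
    return (sum(record[y][0] for y in years), sum(record[y][1] for y in years))


def attendance_index(prev_ra, prev_r, ra, rb, rc):
    """Step (iii): R_j from this year's (R_a, R_b, R_c), last year's R_a, and R_{j-1}."""
    avg = round((ra + rb + rc) / 3)  # rounding to an integer is an assumption
    if prev_ra > ra:
        return avg  # the current-year rating deteriorated: use the plain average
    return max(prev_r, avg)  # no deterioration: the index may not decrease


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(1976, t, window_totals(RECORD, 1976, t))
    r1975 = attendance_index(100, None, 30, 54, 70)  # R_1974 is not needed in this branch
    r1976 = attendance_index(30, r1975, 46, 36, 53)
    print(r1975, r1976)  # 51 51
```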

3.3 Properties of the proposed procedure

(i) The current-year rating, $R_a$, reflects the attendance in the most recent year. The attendance index, R, also takes the near past into account, which enables the operator to build up credit. For instance, in the example of Table VIII, 1975 itself was a Fair year ($R_a = 30$), but the attendance index for 1975 was Good (R = 51). This is due to the perfect attendance during the previous two years. Indeed, if this operator was considered for promotion in 1975, then the score 51 is a better indicator of her attendance behavior (considered as a personal characteristic) than her current-year rating of 30. Similarly, the effect of bad attendance cannot be entirely erased by a single year of perfect attendance, as can easily be seen in the years 1972 and 1973.

By the nature of its definition, R is much smoother than $R_a$ and is a better indicator of attendance. To reemphasize this point, consider an operator whose attendance record fluctuates from (N(1) = 0, L(1) = 0) to (N(1) = 6, L(1) = 12) to (N(1) = 0, L(1) = 0), and so on. The current-year rating then fluctuates between $R_a = 100$ (Excellent) and $R_a = 0$ (Unsatisfactory). The attendance index fluctuates only between R = 58 (Good) and R = 9 (Poor), which seems more appropriate overall.
(ii) If $R_{a,j-1} < R_{a,j}$, then $R_{j-1} \le R_j$. In other words, if the current-year rating has improved, then the attendance index will not decrease. This follows from the definition of R and is intended to avoid negative reinforcement. The situation is exemplified in moving from 1975 to 1976 in Table VIII. There we have $R_{a,1976} = 46 > 30 = R_{a,1975}$, an improvement in the current-year rating. We therefore take $R_{1976} = 51 = R_{1975}$, even though $\operatorname{avg}(R_{a,1976}, R_{b,1976}, R_{c,1976}) = 45 < 51$.

Because the procedure allows the operator to build up credit, the reverse situation does not hold. One can have $R_{a,j-1} > R_{a,j}$ (a deterioration in the current-year rating) together with $R_{j-1} < R_j$ (an improvement in the attendance index). This is exemplified in the ratings of 1977 and 1978 in Table VIII. And indeed, even though the attendance in 1978 was worse than the attendance in 1977, the period 1976-1978 as a whole reflects better attendance than the period 1975-1977.
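Property (ii) and the oscillating example in property (i) can be checked numerically with the same update rule. The sketch is illustrative and rests on several assumptions. An Unsatisfactory rating is scored as 0, which reproduces the quoted values 58 and 9. The ratings for the alternating record are steady-state values read from Tables Vb and Vc. For Vb, R_b = 28 at N(2) = 6, L(2) = 12. For Vc, R_c = 47 at N(3) = 6, L(3) = 12, and the rating is Unsatisfactory at N(3) = 12, L(3) = 24. The randomized check uses arbitrary synthetic ratings, and the helper names are hypothetical.

```python
import random


def attendance_index(prev_ra, prev_r, ra, rb, rc):
    """Update rule of step (iii); Unsatisfactory is scored as 0 (assumption)."""
    avg = round((ra + rb + rc) / 3)
    return avg if prev_ra > ra else max(prev_r, avg)


def run(ratings):
    """Attendance indices for yearly (R_a, R_b, R_c) triples; the first year uses the average."""
    indices = [round(sum(ratings[0]) / 3)]
    for j in range(1, len(ratings)):
        ra, rb, rc = ratings[j]
        indices.append(attendance_index(ratings[j - 1][0], indices[-1], ra, rb, rc))
    return indices


GOOD = (100, 28, 47)  # steady state: a (0, 0) year preceded by (6, 12) and (0, 0) years
BAD = (0, 28, 0)      # steady state: a (6, 12) year preceded by (0, 0) and (6, 12) years

if __name__ == "__main__":
    print(run([BAD, GOOD] * 3))  # [9, 58, 9, 58, 9, 58]

    # Property (ii): whenever R_a improves, the attendance index does not decrease.
    rng = random.Random(1)
    for _ in range(1000):
        ratings = [tuple(rng.randint(0, 100) for _ in range(3)) for _ in range(10)]
        r = run(ratings)
        assert all(r[j - 1] <= r[j] for j in range(1, 10) if ratings[j - 1][0] < ratings[j][0])
    print("property (ii) holds on 1000 random sequences")
```

Property (ii) holds by construction, because the max in the second case never lets R fall below $R_{j-1}$. The random check only confirms that the sketch encodes the rule that way.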

IV. REMARKS 

(i) The technical details are tuned to telephone operators, and more specifically to our sample. These details include the lengths of the periods used for rating and the specific values in Tables Va, b, and c. The method itself, however, can be adapted to other occupations. Some occupations have substantially higher absence rates, such as auto workers [see, for example, the data collected from 60 blue-collar employees of an automobile-parts foundry, reported in Morgan and Herman (Ref. 8, pp. 739)]. There, periods of 1, 2, and 3 years reach too far into the past to bear on the current attendance index. They should be replaced with shorter periods (e.g., 6, 12, and 18 months).

(ii) As pointed out by a Bell Laboratories referee, the choice of the 
scoring bands in Table VI is somewhat arbitrary, and these bands 
differ from the HOLU (high, objective, low, unsatisfactory) bands that
were recommended by the AT&T Measurements Task Force. Since, 
however, our main contribution here is the general approach for 
evaluating attendance (i.e., weighing the recent past in the attendance 
index) rather than the particular details, we prefer to leave the expo- 
sition as is. 

(iii) In view of the difference between younger and older operators with regard to IA's, note that the proposed scheme is tuned to a population of approximately 30-percent younger operators and 70-percent older operators (as in our sample). This proportion emphasizes the better behavior of the older operators without setting unattainable standards for the younger operators.